(multiple) Add roles for backup/restore functionality #3886

Open: abays wants to merge 3 commits into openstack-k8s-operators:main from abays:backup_restore3

@abays
Contributor

@abays abays commented Apr 23, 2026

Add three new Ansible roles for OpenStack on OpenShift backup and
restore using OADP (OpenShift API for Data Protection) and Velero:

  • cifmw_backup_restore: orchestrates backup, restore, and cleanup of OpenStack control plane and data plane resources, including Galera database dumps, Velero CSI volume snapshots, and ordered multi-phase restore sequences.

  • openshift_adp: installs and configures the OADP operator with an S3-compatible storage backend, creates the DataProtectionApplication CR, sets up VolumeSnapshotClass for CSI snapshots, and verifies the BackupStorageLocation is available.

  • deploy_minio: deploys MinIO as a lightweight S3-compatible object store for use as the Velero backup target in development and CI environments.

Also adds playbooks (backup_restore.yaml, backup_restore_tasks.yaml)
to integrate backup and restore into the post-deployment pipeline.

Jira: https://redhat.atlassian.net/browse/OSPRH-22913
Jira: https://redhat.atlassian.net/browse/OSPRH-29819
Jira: https://redhat.atlassian.net/browse/OSPRH-30021

Signed-off-by: Andrew Bays <abays@redhat.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>

@abays abays requested review from danpawlik and evallesp April 23, 2026 13:27
@abays abays force-pushed the backup_restore3 branch from 9113297 to bd28a16 Compare April 23, 2026 13:42
@abays abays changed the title from "[cifmw_backup_restore] Add role for backup/restore functionality" to "(multiple) Add roles for backup/restore functionality" Apr 23, 2026
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/53bd5aeecb3c49f2a2b055441269b372

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 19m 39s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 43s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 28m 19s
✔️ cifmw-crc-podified-edpm-baremetal-minor-update SUCCESS in 2h 07m 15s
cifmw-pod-zuul-files FAILURE in 4m 38s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 00s
cifmw-pod-pre-commit FAILURE in 8m 03s

@abays abays force-pushed the backup_restore3 branch from bd28a16 to f895e3a Compare April 24, 2026 11:59
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bdf74b0181114163974bebdac49d4001

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 14m 41s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 26m 18s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 38m 13s
✔️ cifmw-crc-podified-edpm-baremetal-minor-update SUCCESS in 2h 01m 11s
cifmw-pod-zuul-files FAILURE in 5m 06s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 53s
cifmw-pod-pre-commit FAILURE in 8m 59s

@abays
Contributor Author

abays commented Apr 30, 2026

recheck

Zuul seems to be stuck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/f256b54bd75545b094a3e13373e18b4f

✔️ openstack-k8s-operators-content-provider SUCCESS in 12m 27s
podified-multinode-edpm-deployment-crc RETRY_LIMIT in 31s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 29s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 28s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 58s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 20s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 51s
✔️ cifmw-molecule-cifmw_backup_restore SUCCESS in 2m 11s
✔️ cifmw-molecule-deploy_minio SUCCESS in 2m 06s
✔️ cifmw-molecule-openshift_adp SUCCESS in 2m 12s

Comment thread playbooks/backup_restore.yaml Outdated
Comment thread playbooks/backup_restore.yaml Outdated
Comment thread playbooks/backup_restore.yaml Outdated
Comment thread playbooks/backup_restore.yaml Outdated
Comment thread playbooks/backup_restore.yaml Outdated
Comment thread roles/cifmw_backup_restore/tasks/backup.yml Outdated
Comment thread roles/cifmw_backup_restore/tasks/backup.yml
Comment thread post-deployment.yml Outdated
@abays abays force-pushed the backup_restore3 branch from f0f66d5 to ab71e14 Compare April 30, 2026 19:32
@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/d48f5275a9364111ad351b6439ab0207

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 04m 50s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 28s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 27s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 26s
✔️ cifmw-pod-zuul-files SUCCESS in 5m 26s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 51s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 01s
✔️ cifmw-molecule-cifmw_backup_restore SUCCESS in 2m 07s
✔️ cifmw-molecule-deploy_minio SUCCESS in 2m 06s
✔️ cifmw-molecule-openshift_adp SUCCESS in 2m 10s

@abays
Contributor Author

abays commented Apr 30, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/c0b3b5d7b7144c5980e3796d9b7cd536

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 37m 00s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 22m 48s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 28s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 26s
✔️ cifmw-pod-zuul-files SUCCESS in 6m 39s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 11m 14s
✔️ cifmw-pod-pre-commit SUCCESS in 10m 45s
✔️ cifmw-molecule-cifmw_backup_restore SUCCESS in 2m 17s
✔️ cifmw-molecule-deploy_minio SUCCESS in 2m 25s
✔️ cifmw-molecule-openshift_adp SUCCESS in 2m 22s

@abays
Contributor Author

abays commented May 1, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/c7cbbb84a46548b99d6313dc7900bc41

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 46s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 22m 58s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 27s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 31s
✔️ cifmw-pod-zuul-files SUCCESS in 5m 27s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 21s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 04s
✔️ cifmw-molecule-cifmw_backup_restore SUCCESS in 2m 06s
✔️ cifmw-molecule-deploy_minio SUCCESS in 2m 02s
✔️ cifmw-molecule-openshift_adp SUCCESS in 2m 03s

@evallesp
Contributor

evallesp commented May 4, 2026

recheck

@centosinfra-prod-github-app

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://gateway-cloud-softwarefactory.apps.ocp.cloud.ci.centos.org/zuul/t/rdo/buildset/fc994363bc5242f7b7cfae5a58a70d98

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 38m 34s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 26m 21s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 27s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 27s
✔️ cifmw-pod-zuul-files SUCCESS in 5m 04s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 55s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 24s
✔️ cifmw-molecule-cifmw_backup_restore SUCCESS in 2m 28s
✔️ cifmw-molecule-deploy_minio SUCCESS in 2m 31s
✔️ cifmw-molecule-openshift_adp SUCCESS in 2m 34s

@abays
Contributor Author

abays commented May 5, 2026

recheck

@evallesp
Contributor

evallesp commented May 6, 2026

The role-prefix job status is FAILED when it should be green. Currently the job looks only at the last commit, but checks all the files that the PR contains.

Fixing at: #3903

@evallesp
Contributor

evallesp commented May 8, 2026

recheck

@evallesp
Contributor

evallesp commented May 8, 2026

Once rebased, the CI failure should be fixed.

@stuggi
Contributor

stuggi commented May 13, 2026

@evallesp CI passed. Are we good to get this landed? We'd need it for the next FR.

@stuggi
Contributor

stuggi commented May 13, 2026

rebased

@stuggi stuggi force-pushed the backup_restore3 branch from ae96e2e to 2c20efd Compare May 13, 2026 09:39
dest: "{{ _deploy_minio_rendered_dir.path }}/minio.yaml"
mode: "0644"

- name: Apply MinIO manifests
Contributor

(blocking) suggestion: I'd rather use kubernetes.core.k8s instead of shell.
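
A minimal sketch of what that could look like for the MinIO manifest task, assuming the kubernetes.core collection and its Python client are available on the host that runs the task and that cluster credentials are picked up from the environment. Only the task name and the rendered file path come from the diff context above; the rest is illustrative, not the change that was actually pushed.

```yaml
- name: Apply MinIO manifests
  kubernetes.core.k8s:
    state: present
    # src is read on the managed host, which is where the preceding
    # template task rendered the manifest.
    src: "{{ _deploy_minio_rendered_dir.path }}/minio.yaml"
```

The module reports `changed` only when an object is created or modified, so no hard-coded `changed_when: true` is needed with this form.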

@@ -0,0 +1,74 @@
---
Contributor

(non-blocking) suggestion: Marking as non-blocking as it can be done in a follow-up PR, but we need a README here.

Comment thread roles/openshift_adp/tasks/main.yml Outdated
changed_when: true

- name: Wait for Velero pod to be ready
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.
(non-blocking) suggestion: I'd first check that the pod count is > 0.
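
One way both suggestions could be combined is sketched below with kubernetes.core.k8s_info: the task retries until at least one pod matches and every matching pod reports a Ready condition. The namespace variable, the label selector, the retry count, and the register name `_velero_pods` are assumptions made for illustration; they are not taken from the role.

```yaml
- name: Wait for Velero pod to be ready
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: "{{ cifmw_openshift_adp_namespace }}"  # hypothetical variable
    label_selectors:
      - "app.kubernetes.io/name=velero"  # assumed label, verify on the cluster
  register: _velero_pods
  retries: 30
  delay: 10
  until:
    # First make sure at least one pod exists ...
    - _velero_pods.resources | default([]) | length > 0
    # ... then require a Ready=True condition on every matching pod.
    - >-
      (_velero_pods.resources | default([])
       | map(attribute='status.conditions', default=[]) | flatten
       | selectattr('type', 'equalto', 'Ready')
       | selectattr('status', 'equalto', 'True')
       | list | length)
      == (_velero_pods.resources | default([]) | length)
```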

delay: 10
until: _operator_wait.rc == 0

- name: Create cloud credentials secret
Contributor

(blocking) concern: I'm unsure here. At the very least, it's important to add no_log: true.
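
A hedged sketch of a credentials task with `no_log: true`, written with kubernetes.core.k8s. The secret name, the `cloud` key, the credentials-file layout, and every variable name are assumptions for illustration only; the role's real names may differ.

```yaml
- name: Create cloud credentials secret
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: cloud-credentials  # assumed secret name
        namespace: "{{ cifmw_openshift_adp_namespace }}"  # hypothetical variable
      type: Opaque
      stringData:
        cloud: |
          [default]
          aws_access_key_id={{ cifmw_openshift_adp_s3_access_key }}
          aws_secret_access_key={{ cifmw_openshift_adp_s3_secret_key }}
  # Keeps the rendered secret out of task output and logs.
  no_log: true
```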

Comment thread roles/openshift_adp/tasks/main.yml Outdated
when: cifmw_openshift_adp_enable_node_agent | bool

- name: Get OADP pods
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

Comment thread roles/openshift_adp/tasks/main.yml Outdated
# VolumeSnapshotClass for CSI snapshots
# ========================================
- name: Check for existing VolumeSnapshotClass
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

Comment thread roles/openshift_adp/tasks/main.yml Outdated
changed_when: true

- name: Create Subscription for OADP operator
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

Comment thread roles/openshift_adp/tasks/main.yml Outdated
changed_when: true

- name: Create OperatorGroup for OADP
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

Comment thread roles/openshift_adp/tasks/main.yml Outdated
- "Node Agent (Kopia): {{ cifmw_openshift_adp_enable_node_agent }}"

- name: Create OADP namespace
ansible.builtin.shell: |
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

delay: 10
until: _velero_wait.rc == 0

- name: Wait for node-agent pods to be ready
Contributor

(blocking) suggestion: Let's remove failed_when: false.
(non-blocking) suggestion: I'd first check that the pod count is > 0.

register: _s3_api_url
changed_when: false

- name: Create DataProtectionApplication
Contributor

(blocking) suggestion: let's use kubernetes.core.k8s.

abays and others added 2 commits May 13, 2026 07:13
Deploy MinIO as a lightweight S3-compatible object store for use
as the Velero backup target in development and CI environments.

Signed-off-by: Andrew Bays <abays@redhat.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Install and configure the OADP (OpenShift API for Data Protection)
operator with an S3-compatible storage backend, create the
DataProtectionApplication CR, set up VolumeSnapshotClass for CSI
snapshots, and verify the BackupStorageLocation is available.

Signed-off-by: Andrew Bays <abays@redhat.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abays abays force-pushed the backup_restore3 branch from 2c20efd to bb86df8 Compare May 13, 2026 11:16
Contributor

@evallesp evallesp left a comment

In general this looks good, but I'd try to move away from .shell to kubernetes.core.* when possible.
Also, I see some changed_when: true entries that could check a condition instead.

for backup/restore. Without it, user-provided resources (e.g. osp-secret)
will not be restored.
Create an OpenStackBackupConfig CR before running backup.
when: _backupconfig_check.rc != 0 or _backupconfig_check.stdout == ""
Contributor

(non-blocking) suggestion: I think this is OK. The model review said we might also want to check .status.conditions.

@@ -0,0 +1,14 @@
---
Contributor

(non-blocking) question: is this name intended to be 06a?

- name: Create OADP PVC backup
  ansible.builtin.shell: |
    oc apply -f {{ _cifmw_backup_restore_rendered_dir.path }}/backup-pvcs.yaml
  changed_when: true
Contributor

(non-blocking) suggestion: I think we should add a better changed_when clause here by checking the oc apply output.
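
A small sketch of deriving the changed status from the `oc apply` output instead of hard-coding it. It relies on `oc apply` reporting each object as `created`, `configured`, or `unchanged`; the register name `_pvc_backup_apply` is hypothetical.

```yaml
- name: Create OADP PVC backup
  ansible.builtin.command:
    cmd: oc apply -f {{ _cifmw_backup_restore_rendered_dir.path }}/backup-pvcs.yaml
  register: _pvc_backup_apply  # hypothetical register name
  # Report a change only if at least one object was created or updated.
  changed_when: >-
    'created' in _pvc_backup_apply.stdout or
    'configured' in _pvc_backup_apply.stdout
```

The substring check is deliberately simple and would misfire on a resource whose name contains one of those words. A stricter pattern, or kubernetes.core.k8s, which reports changes itself, avoids that.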

- name: Create OADP resources backup
  ansible.builtin.shell: |
    oc apply -f {{ _cifmw_backup_restore_rendered_dir.path }}/backup-resources.yaml
  changed_when: true
Contributor

(non-blocking) suggestion: I think we should add a better changed_when clause here by checking the oc apply output.

- name: Delete DataPlaneDeployment CRs
  ansible.builtin.shell: |
    oc delete openstackdataplanedeployment --all -n {{ cifmw_backup_restore_namespace }}
  changed_when: true
Contributor

(non-blocking) suggestion: I think we should add a better changed_when clause here by checking the command output. (This applies to all the Delete tasks here.)

@abays abays force-pushed the backup_restore3 branch from bb86df8 to 8988fd7 Compare May 13, 2026 12:01
when: cifmw_backup_restore_cleanup_dataplane | bool

- name: Delete DataPlaneNodeSet CRs
ansible.builtin.shell: |
Contributor

(blocking) suggestion: Do you think it's possible to move away from shell in the deletes?

Contributor Author

Changed many of them based on the agent's recommendation. Some still use shell. If you'd like the rest converted, I will look into it.
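
For the remaining deletes, one shell-free pattern is to list the CRs with kubernetes.core.k8s_info and remove each one with kubernetes.core.k8s. This is a sketch only: the API version `dataplane.openstack.org/v1beta1` and the register name `_dpd_list` are assumptions, and it is not necessarily what the role ended up using.

```yaml
- name: List DataPlaneDeployment CRs
  kubernetes.core.k8s_info:
    api_version: dataplane.openstack.org/v1beta1  # assumed API version
    kind: OpenStackDataPlaneDeployment
    namespace: "{{ cifmw_backup_restore_namespace }}"
  register: _dpd_list  # hypothetical register name

- name: Delete DataPlaneDeployment CRs
  kubernetes.core.k8s:
    state: absent
    api_version: dataplane.openstack.org/v1beta1  # assumed API version
    kind: OpenStackDataPlaneDeployment
    namespace: "{{ cifmw_backup_restore_namespace }}"
    name: "{{ item.metadata.name }}"
    wait: true
  loop: "{{ _dpd_list.resources }}"
  loop_control:
    label: "{{ item.metadata.name }}"
```

With this form the delete task reports `changed` only for objects that actually existed, so `changed_when: true` is no longer needed.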

Orchestrate backup, restore, and cleanup of OpenStack control plane
and data plane resources, including Galera database dumps, Velero CSI
volume snapshots, and ordered multi-phase restore sequences.

Also adds playbooks (backup_restore.yaml) and integrates backup and
restore into the post-deployment pipeline.

Signed-off-by: Andrew Bays <abays@redhat.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
@abays abays force-pushed the backup_restore3 branch from 8988fd7 to a7b9cfd Compare May 13, 2026 14:07

4 participants